Towards a Spreadsheet Engineering
نویسندگان
چکیده
In this paper, we report some on-going focused research, but are further keen to set it in the context of a proposed bigger picture, as follows. There is a certain depressing pattern about the attitude of industry to spreadsheet error research and a certain pattern about conferences highlighting these issues. Is it not high time to move on from measuring spreadsheet errors to developing an armoury of disciplines and controls? In short, we propose the need to rigorously lay the foundations of a spreadsheet engineering discipline. Clearly, multiple research teams would be required to tackle such a big task. This suggests the need for both national and international collaborative research, since any given group can only address a small segment of the whole. There are already a small number of examples of such on-going international collaborative research. Having established the need for a directed research effort, the rest of the paper then attempts to act as an exemplar in demonstrating and applying this focus. With regard to one such of research, in a recent paper, Panko (2005) stated that: “...group development and testing appear to be promising areas to pursue.” Of particular interest to us are some gaps in the published research record on techniques to reduce errors. We further report on the topics: techniques for cross-checking, time constraints effects, and some aspects of developer perception. 1. SPREADSHEET ENGINEERING Given the fact that spreadsheet modellers are not IS professionals, there has been significant effort to adapt existing software engineering principles to form a spreadsheet engineering discipline more sympathetic to spreadsheet modellers (Burnett et al. 2001, Burnett et al. 2003, Burnett et al 2004, Grossman 2002, Grossman and Ozluk 2004, Panko 2006, Nash and Goldberg 2005, Rajalingham et al. 2000). Some offer ‘best practice’ guidelines (Grossman 2002, Read and Batson, 1999, O’Beirne 2005) whilst others seek to develop a framework for spreadsheet engineering (Grossman and Ozluk 2004, Burnett et al. 2003, Burnett et al 2004, Rajalingham et al. 2000) or develop specific elements in a software lifecycle, such as testing (Panko 2006, Pryor 2004, Nash and Goldberg 2005, Yirsaw 2003) Best practice guidelines in spreadsheets have proved difficult to settle on. Colver (2004) advocates that ‘best practice’ in spreadsheets is impossible to attain since adopting one approach to spreadsheet development often has negative side effects on the positive aspects of spreadsheet technology such as flexibility and speed of development. Other authors disagree, Read and Batson (1999) produced a detailed paper on spreadsheet best practice for organisations. This paper describes best practice from a systems development lifecycle approach, detailing best practices for planning, design, building, testing, maintenance and evaluation. This is compiled by the authors from years of experience gathered in Price Waterhouse Coopers (PWC). The actual best practice comes in the form of advice and guidelines for carrying out specific tasks as well as encouraging the reader to practice more general best practice, for example identifying stakeholders in the spreadsheet and conducting user acceptance testing respectively. Grossman (2002) presents eight best practice principles based upon literature from spreadsheet modelling and a number of other related disciplines. Grossman highly recommends adopting best practice and presents evidence that doing so can significantly reduce error. O’Beirne (2005) draws from extensive experience to provide best practice in spreadsheets. The guidance offered comes in the form of both general, such as following a format when setting a spreadsheet up, and specific recommendations such as ensuring cell protection on cells with formulae. 1.1 Framework for spreadsheet engineering Attempts have been made to modify and adapt frameworks in software engineering to substantiate spreadsheet engineering. Burnett et al. (2003) describes end user software engineering in the spreadsheet paradigm using assertions for debugging spreadsheets. It was discovered that the assertions helped the end users debug the spreadsheets, they caught more errors. Further, the participants routinely understood what the assertions meant and actually liked having them as a guide. This debugging was presented in the wider context of an iterative end user development life cycle. Burnett et al. (2004) argues that since spreadsheet modellers are not IS professionals, it is more practical to employ a smaller feedback loop rather than provide a comprehensive traditional SDLC based methodology. The feedback loop incorporates the following: Interactive testing (testing while the user is modelling); Fault localisation (tool for locating faults after testing); Interactive assertions (monitoring values in the spreadsheet and alerting users to potential discrepancies) and motivational devices (Gets the user to participate in software engineering methods). Grossman and Ozluk (2004) extend previous work on spreadsheet engineering principles, to give a more traditional adaptation of the SDLC and moves away from a best practice approach. This fresh approach gives consideration to the actual use of the final spreadsheet and recommends incorporating users into the development and holding a review of use after implementation. 1.2 Evidence of Spreadsheet Errors Human errors are very common and inevitable, (Panko, 2005). Human beings commit errors in every walk of life. It is the very internal nature of human beings and is very difficult to change. But, what can be changed is the external nature. The idea is to modify the external factors to cope with the erring nature of humans, and thereby, improve the accuracy of spreadsheets. ‘External factors’ mean strategies or approaches incorporated by the organisations to maintain the quality of spreadsheets. To begin with, we mention some evidences of spreadsheet errors and the steps taken by some researchers to enhance the quality of spreadsheets. Spreadsheet models are very widely used and are very likely to contain errors (Panko and Halverson, 2001). Following are some recent evidences of the occurrence of spreadsheet errors. A simple spreadsheet error (cut-and-paste) cost a firm a whopping US$24m. The mistake led to TransAlta, a big Canadian power generator, buying more US power transmission hedging contracts at higher prices than it should have. (Cullen, 2003) A US government audit says the Columbia Housing Authority has to pay $216,352 to cover expenses incurred as it gave some Section 8 tenants too much room and landlords excess rent. Phil Steinhaus, the housing authority’s CEO, asked that the fees for over-housing be waived but agreed to pay $118,387, the amount that resulted from a spreadsheet data-entry error that overpaid landlords. (Miller, 2006) A chaotic situation in the posting of minimum bid prices for the first phase of North Port's abandoned lot auction led to confusion as the cost of some lots seemingly tripled overnight. In a rush to make the prices available to public before Christmas, the appraiser hired by the county put the auction lot number, the property ID number and the minimum bid amount onto a spreadsheet in sequential order but, inadvertently, did not sort the value column. (Venice Gondolier Sun, 2006) Eastman Kodak Co. added $9m to its big third-quarter loss, to correct its several accounting errors. The adjustments reflect restructuring and severance costs linked to its ongoing effort to turn itself into a digital photography business. A Kodak spokesman said an $11m severance error was traced to a faulty spreadsheet and there were too many zeros added to the employee's accrued severance. But it was an accrual. There was never a payment. (Jelter, 2005) A miscalculation in a spreadsheet almost cost Chi Omega sorority first place in the Homecoming competition. Katie Gonsoulin, Homecoming Committee chairperson, said the error occurred when the formula used to calculate scores from Homecoming Week events left two scores out of the tabulation. The resulting scores announced at the Homecoming game were incorrect. (Beagle, 2004) Westpac had to halt trading on its shares and deliver its annual profit briefing a day early, after it accidentally emailed its results to research analysts. Deta ils of the $2.818bn record annual profit result, which were due to be announced, were overshadowed by concerns of some information being leaked into market. The new figures were embedded in a template of last year's results and were accessible with minor manipulation of the spreadsheet. Chief financial officer, Philip Chronican, said it was not just one error, but a compounding of 2 or 3 errors. (Knight, 2005) 1.3 Approaches by Other Researchers Rajalingham et al (2000) proposed an approach, the significant feature of which is that it adopts concepts from software engineering and employs important principles and techniques such as a unique definition of spreadsheet model elements (chiefly labels, data values and formulae), hierarchical representation of a formula in tree form, and separation of data (user-entered data values) and operations (formulae that operate on them). Berge et al (2005) worked on a project to help end-users to locate and prevent, principally, mistyping and other human errors. Their implementation gives an option to visualize dependencies (represented by arrows) between cells in the spreadsheet to help the user see any inconsistencies in references between cells. Also, they implemented a way to assign a type to a cell which warns the user when a faulty type is entered. Further, they have a tool which visualizes the types and gives a better overview of the types in the spreadsheet. UML diagrams (Use-cases, Class diagrams and Interaction diagrams) were used in the requirements planning and design phases of this project. Aiming to facilitate analysis and comprehension of the different types of spreadsheet errors and to clearly understand the characteristics of an error as well as the nature of its occurrence, Rajalingham et al (2000) came up with a classification or taxonomy of errors. This is an outcome of a thorough investigation of the widespread problem of spreadsheet errors and an analysis of specific types of these errors. It also enables users to gain a better understanding of the different types of errors that can occur in their spreadsheet models. Appropriate tools, techniques and methods can subsequently be developed to prevent their occurrence in the first place or enhance the chances of detecting these errors after they have occurred. In addition to that, when a new specific type of error is identified, it can be placed in the appropriate category within the taxonomy. In the process of classifying the error, spreadsheet developers and end-users are bound to gain a much deeper understanding of the error. This is because they are forced to examine and compare its characteristics with those of other spreadsheet errors. Another important strategy is ‘code inspection’. Panko (1999 cited Panko 2005) found that team code inspection allowed undergraduate MIS majors to find 83% of all seeded errors in a spreadsheet, although the group did not find errors not previously found by the members of the team, who had inspected it alone before the group code inspection. Panko’s study was centred on ‘tetrads’ to detect errors seeded in spreadsheets already designed. 1.4 Our approach to group work Contrastingly, our study as discussed below, is centred on working in ‘pairs’ to cross-check the overall work done individually. Our study also addresses several other aspects of spreadsheets with regard to design, implementation and testing: namely modelling, determining the appropriate formula to solve the problem, entering data into the cells and presenting the data. A novel aspect of our study is that ‘dyads’ cross-checking their work could find errors unidentified when they worked on their own. Usefully, employees’ perceptions on group work and on working in pairs to cross-check their work were also reported. This study was based on an assumption from the evidences of spreadsheet errors that some errors might have been committed either in a hurry or due to lack of time to cross-check with others. Also, some errors could have probably been avoided if they had taken time and/or cross-checked with others. The following experiments were conducted: 1. Assessing the usefulness of cross-checking to improve spreadsheet accuracy. 2. Evaluating the benefits of group work and comparing it with the cross-check approach. 3. Examining the effects of time constraints on spreadsheet accuracy. Surveys of spreadsheet developers (Panko, 2005) indicate that spreadsheet creation, in contrast, is informal, and few organizations have comprehensive policies for spreadsheet development. Further, as we have seen, there are diverse approaches like legal policies, software engineering and development techniques, group work and other strategies proposed by various researchers. However, the seriousness of spreadsheet errors justifies the necessity of varied approaches to enhance the spreadsheet quality. As with any true engineering discipline, spreadsheet engineering looks set to require numerous and distinct strategies to encompass such a troubling issue. These three experiments are related to development, implementation and testing aspects of Spreadsheet Engineering.
منابع مشابه
Quality Control in Spreadsheets: A Software Engineering-based Approach to Spreadsheet Development
This paper presents work conducted towards the development of an effective software engineering-based methodology for integrity control in the process of spreadsheet development. Various approaches and concepts within the discipline of software engineering are investigated. The proposed methodology consists of a set of coherent stages incorporating relevant software engineering techniques and p...
متن کاملDeveloping Visualisations For Spreadsheet Formulae: Towards Increasing the Accessibility of Science, Technology, Engineering and Maths Subjects
Spreadsheets are widely used within Science, technology, engineering and maths education. Despite their widespread use, end-user spreadsheet errors are still extremely common and have been shown to have an adverse effect on learning. The textual representation of formulas can be particularly complex and error-prone, exacerbating barriers to dyslexic users. Our work focuses on the design and dev...
متن کاملTowards Evaluating the Quality of a Spreadsheet: The Case of the Analytical Spreadsheet Model
We consider the challenge of creating guidelines to evaluate the quality of a spreadsheet model. We suggest four principles. First, state the domain—the spreadsheets to which the guidelines apply. Second, distinguish between the process by which a spreadsheet is constructed from the resulting spreadsheet artifact. Third, guidelines should be written in terms of the artifact, independent of the ...
متن کاملTowards a New Model of Abstraction in Software Engineering
ion In Action I want to start by talking about the current view of abstraction in software engineering: how we use it, what the principles are, what the terminology is and what it does for us. Rather than attempting any sort of formal definition, I will just use an example. I will talk about the implementation of a familiar system, using familiar terms of abstraction, with the goal of getting t...
متن کاملA Paradigm for Spreadsheet Engineering Methodologies
Spreadsheet engineering methodologies are diverse and sometimes contradictory. It is difficult for spreadsheet developers to identify a spreadsheet engineering methodology that is appropriate for their class of spreadsheet, with its unique combination of goals, type of problem, and available time and resources. There is a lack of well-organized, proven methodologies with known costs and benefit...
متن کاملSpreadsheet Engineering: A Research Framework
Spreadsheet engineering adapts the lessons of software engineering to spreadsheets, providing eight principles as a framework for organizing spreadsheet programming recommendations. Spreadsheets raise issues inadequately addressed by software engineering. Spreadsheets are a powerful modeling language, allowing strategic rapid model change, and enabling exploratory modeling. Spreadsheets users l...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/0803.0159 شماره
صفحات -
تاریخ انتشار 2006